123 check building cons area data #124
base: dev
…england building conservation areas dataset
Hey Roisin, nice stuff!
I left some comments in the notebook. Overall the analysis is okay. I think one of the joins needs fixing, and a bit more commenting would have helped me follow along. I've put questions where I had issues.
However, overall, the results seem to indicate we shouldn't be dropping. I thought it would be good to go through and check the two most "off" local authorities.
Stratford-on-Avon is missing loads of data in the Historic England dataset, as you note in your analysis. I looked at their website and can't seem to find any of the conservation areas they list when searching by name. Oxford Canal is weirdly in the HE data but isn't on their site, while the other area in the dataset we have, Whichford, is in there. Seems there's not much we can do about this missingness issue?
I also checked Derbyshire Dales. This is where we have too many conservation areas compared to those Derbyshire Dales reports. After some manual investigation, these extras turn out to be conservation areas within the Peak District National Park!
Could this be a common driver of differences? LAs may not be responsible for managing conservation areas that fall under a national park designation, but that isn't reflected in the HE/Wales data?
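One way to test the national park hypothesis would be to flag conservation areas that intersect a national park boundary. This is only a rough sketch on toy geometries: the `national_parks` frame and the `cons_area_id`, `park_name` and `in_national_park` columns are made-up names, and a real national park boundary dataset would need to be sourced separately.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy geometries in British National Grid (EPSG:27700).
# All frame and column names here are hypothetical.
cons_areas = gpd.GeoDataFrame(
    {"cons_area_id": [1, 2, 3]},
    geometry=[box(0, 0, 10, 10), box(50, 50, 60, 60), box(200, 200, 210, 210)],
    crs="EPSG:27700",
)
national_parks = gpd.GeoDataFrame(
    {"park_name": ["Toy National Park"]},
    geometry=[box(40, 40, 100, 100)],
    crs="EPSG:27700",
)

# Left spatial join: every conservation area is kept, and park_name is
# populated only where the area intersects a national park polygon.
joined = gpd.sjoin(cons_areas, national_parks, how="left", predicate="intersects")
joined["in_national_park"] = joined["park_name"].notna()

print(joined[["cons_area_id", "park_name", "in_national_park"]])
```

If the extras in other LAs also fall inside park boundaries, that would support treating national parks as a separate case rather than as LA-level errors.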
On duplicated names, I think there are a few different options that you outline! I spot-checked a few, and a lot seem to be contiguous areas stored as separate shapes. However, there are some that are not next to each other. Not sure if you'd be able to fully validate these. These are probably cases where it's difficult to do anything automatically, and we wouldn't necessarily want to without more confidence. The problem case would be ones like I saw in my GitHub notifications where an entire area is classed as a conservation area.
You could check that there are no conservation area polygons fully contained in another. You could do a spatial join, where you expect each area to contain only itself, and flag the others to check. Those would be ones where we maybe check manually and, if we agree, automatically remove, so as not to overestimate conservation area coverage through some wild setup! Maybe @lizgzil already checked for this in #130?
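A minimal sketch of that containment check, assuming each polygon has a unique identifier (the `area_id` column and the toy geometries are hypothetical, not taken from the notebook). Names are not used as the key because the dataset has duplicated names.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy conservation areas: "inner" sits entirely inside "outer".
cons_areas = gpd.GeoDataFrame(
    {"area_id": [1, 2, 3], "name": ["outer", "inner", "separate"]},
    geometry=[box(0, 0, 10, 10), box(2, 2, 4, 4), box(20, 20, 30, 30)],
    crs="EPSG:27700",
)

# Self spatial join: each left polygon is matched to every right polygon it
# contains. Every polygon contains itself, so those pairs are expected.
pairs = gpd.sjoin(cons_areas, cons_areas, how="inner", predicate="contains")

# Keep only the pairs where a polygon contains a *different* polygon.
flagged = pairs[pairs["area_id_left"] != pairs["area_id_right"]]

print(flagged[["area_id_left", "name_left", "area_id_right", "name_right"]])
```

Exact duplicate polygons would be flagged in both directions, so the flagged pairs are best treated as a list for manual review rather than something to remove automatically.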
counts_df = pl.read_csv(
    "s3://asf-heat-pump-suitability/evaluation/building_conservation_area_datasets/building_conservation_area_counts_sample.csv"
)
cons_areas_df = cons_areas_df.join(counts_df, how="inner", on="LAD23NM")
This is dropping 3 rows where we have no conservation areas: Wakefield, Wokingham, and Leeds. Not sure if intentional, but if not, should do an outer join and fill in in_conservation_area_ew with 0 I think?
These seem to be lost in the initial filtering on overlay? Or at least some of them.
Great spot, thank you! Yes, you are correct, they are lost in the filtering step. When I remove the filtering, I can see they are all retained. Between them they get matched to 11 conservation areas, with all but 1 match being <1% of the conservation area. The other is 2.96%.
Ah I see!
With the new filtering condition (keeping conservation area overlays over 50 m²), I can see this is no longer an issue and the three areas you mentioned are now included.
Also, I thought about instances where there are 0 conservation areas joined. I think those rows should be excluded anyway, because that's clearly erroneous - every LA should have at least 1 conservation area, so we can remove LADs with counts of 0 from this analysis.
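For reference, the area-threshold filtering could be sketched roughly like this. It runs on toy geometries, not the notebook's actual code: `LAD23NM` and `cons_area_size_m2` mirror names used in the PR, while `cons_area`, `overlay_m2`, `pct_of_cons_area` and `MIN_OVERLAY_M2` are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import box

MIN_OVERLAY_M2 = 50  # threshold quoted in this thread

# Toy LAD boundaries sharing an edge at x = 200 (EPSG:27700, metres).
lads = gpd.GeoDataFrame(
    {"LAD23NM": ["LA1", "LA2"]},
    geometry=[box(0, 0, 200, 200), box(200, 0, 400, 200)],
    crs="EPSG:27700",
)
# Area B overlaps LA1 by only a 0.25 m x 100 m sliver (25 m2).
cons = gpd.GeoDataFrame(
    {"cons_area": ["A", "B"]},
    geometry=[box(10, 10, 110, 110), box(199.75, 0, 300, 100)],
    crs="EPSG:27700",
)
cons["cons_area_size_m2"] = cons.geometry.area

# Intersect conservation areas with LAD boundaries, then measure each piece.
overlay = gpd.overlay(cons, lads, how="intersection")
overlay["overlay_m2"] = overlay.geometry.area
overlay["pct_of_cons_area"] = 100 * overlay["overlay_m2"] / overlay["cons_area_size_m2"]

# Drop sliver matches below the threshold.
kept = overlay[overlay["overlay_m2"] > MIN_OVERLAY_M2]

print(kept[["cons_area", "LAD23NM", "overlay_m2", "pct_of_cons_area"]])
```

An absolute threshold like this removes boundary slivers without also dropping small genuine overlaps, which a percentage-of-area cut-off could do for large conservation areas.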
asf_heat_pump_suitability/analysis/protected_areas/20250207_missing_protected_areas.py
# %%
full_cons_areas_gdf["cons_area_size_m2"] = full_cons_areas_gdf["geometry"].area
full_cons_areas_gdf = gpd.overlay(
Also, just noticed that because we are doing the intersection, we are dropping over 400 rows from the original data. A lot of this is in places like Anglesey, where I can see Beaumaris and the Menai Bridge, but also some villages in England as I spot check. I guess there's not too much to do since we are validating against the boundaries we have, but it's interesting as it could be another way we are losing data relative to the boundaries that we do have.
I've updated the code. There are 9 conservation areas which are dropped during the join from the original data. I can see they are dropped after the initial overlay so there must not be any LAD boundary data for the areas where they are. I think this is ok to exclude from our analysis as it's such a small percentage of the conservation areas.
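To list exactly which conservation areas fall out at the overlay step, something like the following anti-join could work. It is a sketch on toy geometries, and `cons_area_id` is a hypothetical unique identifier that the real data may not have.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy data: area 2 lies outside every LAD boundary we hold.
lads = gpd.GeoDataFrame(
    {"LAD23NM": ["LA1"]},
    geometry=[box(0, 0, 100, 100)],
    crs="EPSG:27700",
)
cons = gpd.GeoDataFrame(
    {"cons_area_id": [1, 2]},
    geometry=[box(10, 10, 20, 20), box(500, 500, 510, 510)],
    crs="EPSG:27700",
)

# Intersection only returns pieces that overlap an LAD polygon.
overlay = gpd.overlay(cons, lads, how="intersection")

# Anti-join: conservation areas whose id never appears in the overlay result.
dropped = cons[~cons["cons_area_id"].isin(overlay["cons_area_id"])]

print(dropped)
```

Plotting `dropped` against the LAD boundaries would show whether the missing areas sit offshore or just outside the boundary set.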
Fixes #123
Description
Instructions for Reviewer
In order to test the code in this PR you need to ...
In terminal, run the following line:
jupytext --to notebook asf_heat_pump_suitability/analysis/protected_areas/20250207_missing_protected_areas.py
Then you can run the notebook.
Please pay special attention to ... in_protected_area with False. I think this analysis shows we could, although we would have to accept that there may be some missing conservation areas for some LAs. Let me know what you think.
Checklist:
... notebooks/ ...
... pre-commit and addressed any issues not automatically fixed
... dev
... READMEs